Indexing Arbitrary-Length k-Mers in Sequencing Reads
Internal identifier: 001747 (Main/Exploration); previous: 001746; next: 001748
Authors: Tomasz Kowalski [Poland]; Szymon Grabowski [Poland]; Sebastian Deorowicz [Poland]
Source:
- PLoS ONE [ 1932-6203 ] ; 2015.
French descriptors
- KwdFr :
- MESH :
English descriptors
- KwdEn :
- MESH :
- genetics : Caenorhabditis elegans, Escherichia coli.
- methods : Sequence Analysis, RNA.
- statistics & numerical data : Sequence Analysis, RNA.
- Algorithms, Animals, Datasets as Topic, Genome, High-Throughput Nucleotide Sequencing, Humans, Software.
Abstract
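We propose a lightweight data structure for indexing and querying collections of NGS reads data in main memory. The data structure supports the interface proposed in the pioneering work by Philippe et al. for counting and locating k-mers in sequencing reads. Our solution, PgSA (pseudogenome suffix array), based on finding overlapping reads, is competitive to the existing algorithms in the space use, query times, or both. The main applications of our index include variant calling, error correction and analysis of reads from RNA-seq experiments.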
Url:
DOI: 10.1371/journal.pone.0133198
PubMed: 26182400
PubMed Central: 4504488
Affiliations:
Links toward previous steps (curation, corpus...)
- to stream Pmc, to step Corpus: 001009
- to stream Pmc, to step Curation: 001009
- to stream Pmc, to step Checkpoint: 000E27
- to stream PubMed, to step Corpus: 001552
- to stream PubMed, to step Curation: 001552
- to stream PubMed, to step Checkpoint: 001510
- to stream Ncbi, to step Merge: 001187
- to stream Ncbi, to step Curation: 001187
- to stream Ncbi, to step Checkpoint: 001187
- to stream Main, to step Merge: 001752
- to stream Main, to step Curation: 001747
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Indexing Arbitrary-Length <italic>k</italic>-Mers in Sequencing Reads</title>
<author><name sortKey="Kowalski, Tomasz" sort="Kowalski, Tomasz" uniqKey="Kowalski T" first="Tomasz" last="Kowalski">Tomasz Kowalski</name>
<affiliation wicri:level="1"><nlm:aff id="aff001"><addr-line>Institute of Applied Computer Science, Lodz University of Technology, Al. Politechniki 11, 90-924 Łódź, Poland</addr-line>
</nlm:aff>
<country xml:lang="fr">Pologne</country>
<wicri:regionArea>Institute of Applied Computer Science, Lodz University of Technology, Al. Politechniki 11, 90-924 Łódź</wicri:regionArea>
<wicri:noRegion>90-924 Łódź</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Grabowski, Szymon" sort="Grabowski, Szymon" uniqKey="Grabowski S" first="Szymon" last="Grabowski">Szymon Grabowski</name>
<affiliation wicri:level="1"><nlm:aff id="aff001"><addr-line>Institute of Applied Computer Science, Lodz University of Technology, Al. Politechniki 11, 90-924 Łódź, Poland</addr-line>
</nlm:aff>
<country xml:lang="fr">Pologne</country>
<wicri:regionArea>Institute of Applied Computer Science, Lodz University of Technology, Al. Politechniki 11, 90-924 Łódź</wicri:regionArea>
<wicri:noRegion>90-924 Łódź</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Deorowicz, Sebastian" sort="Deorowicz, Sebastian" uniqKey="Deorowicz S" first="Sebastian" last="Deorowicz">Sebastian Deorowicz</name>
<affiliation wicri:level="1"><nlm:aff id="aff002"><addr-line>Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland</addr-line>
</nlm:aff>
<country xml:lang="fr">Pologne</country>
<wicri:regionArea>Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice</wicri:regionArea>
<wicri:noRegion>44-100 Gliwice</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">26182400</idno>
<idno type="pmc">4504488</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4504488</idno>
<idno type="RBID">PMC:4504488</idno>
<idno type="doi">10.1371/journal.pone.0133198</idno>
<date when="2015">2015</date>
<idno type="wicri:Area/Pmc/Corpus">001009</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">001009</idno>
<idno type="wicri:Area/Pmc/Curation">001009</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">001009</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000E27</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">000E27</idno>
<idno type="wicri:source">PubMed</idno>
<idno type="RBID">pubmed:26182400</idno>
<idno type="wicri:Area/PubMed/Corpus">001552</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001552</idno>
<idno type="wicri:Area/PubMed/Curation">001552</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">001552</idno>
<idno type="wicri:Area/PubMed/Checkpoint">001510</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Checkpoint">001510</idno>
<idno type="wicri:Area/Ncbi/Merge">001187</idno>
<idno type="wicri:Area/Ncbi/Curation">001187</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">001187</idno>
<idno type="wicri:Area/Main/Merge">001752</idno>
<idno type="wicri:Area/Main/Curation">001747</idno>
<idno type="wicri:Area/Main/Exploration">001747</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">Indexing Arbitrary-Length <italic>k</italic>-Mers in Sequencing Reads</title>
<author><name sortKey="Kowalski, Tomasz" sort="Kowalski, Tomasz" uniqKey="Kowalski T" first="Tomasz" last="Kowalski">Tomasz Kowalski</name>
<affiliation wicri:level="1"><nlm:aff id="aff001"><addr-line>Institute of Applied Computer Science, Lodz University of Technology, Al. Politechniki 11, 90-924 Łódź, Poland</addr-line>
</nlm:aff>
<country xml:lang="fr">Pologne</country>
<wicri:regionArea>Institute of Applied Computer Science, Lodz University of Technology, Al. Politechniki 11, 90-924 Łódź</wicri:regionArea>
<wicri:noRegion>90-924 Łódź</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Grabowski, Szymon" sort="Grabowski, Szymon" uniqKey="Grabowski S" first="Szymon" last="Grabowski">Szymon Grabowski</name>
<affiliation wicri:level="1"><nlm:aff id="aff001"><addr-line>Institute of Applied Computer Science, Lodz University of Technology, Al. Politechniki 11, 90-924 Łódź, Poland</addr-line>
</nlm:aff>
<country xml:lang="fr">Pologne</country>
<wicri:regionArea>Institute of Applied Computer Science, Lodz University of Technology, Al. Politechniki 11, 90-924 Łódź</wicri:regionArea>
<wicri:noRegion>90-924 Łódź</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Deorowicz, Sebastian" sort="Deorowicz, Sebastian" uniqKey="Deorowicz S" first="Sebastian" last="Deorowicz">Sebastian Deorowicz</name>
<affiliation wicri:level="1"><nlm:aff id="aff002"><addr-line>Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland</addr-line>
</nlm:aff>
<country xml:lang="fr">Pologne</country>
<wicri:regionArea>Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice</wicri:regionArea>
<wicri:noRegion>44-100 Gliwice</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series><title level="j">PLoS ONE</title>
<idno type="eISSN">1932-6203</idno>
<imprint><date when="2015">2015</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Algorithms</term>
<term>Animals</term>
<term>Caenorhabditis elegans (genetics)</term>
<term>Datasets as Topic</term>
<term>Escherichia coli (genetics)</term>
<term>Genome</term>
<term>High-Throughput Nucleotide Sequencing</term>
<term>Humans</term>
<term>Sequence Analysis, RNA (methods)</term>
<term>Sequence Analysis, RNA (statistics &amp; numerical data)</term>
<term>Software</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr"><term>Algorithmes</term>
<term>Analyse de séquence d'ARN</term>
<term>Animaux</term>
<term>Caenorhabditis elegans (génétique)</term>
<term>Données de la recherche comme sujet</term>
<term>Escherichia coli (génétique)</term>
<term>Génome</term>
<term>Humains</term>
<term>Logiciel</term>
<term>Séquençage nucléotidique à haut débit</term>
</keywords>
<keywords scheme="MESH" qualifier="genetics" xml:lang="en"><term>Caenorhabditis elegans</term>
<term>Escherichia coli</term>
</keywords>
<keywords scheme="MESH" qualifier="génétique" xml:lang="fr"><term>Caenorhabditis elegans</term>
<term>Escherichia coli</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en"><term>Sequence Analysis, RNA</term>
</keywords>
<keywords scheme="MESH" qualifier="statistics &amp; numerical data" xml:lang="en"><term>Sequence Analysis, RNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en"><term>Algorithms</term>
<term>Animals</term>
<term>Datasets as Topic</term>
<term>Genome</term>
<term>High-Throughput Nucleotide Sequencing</term>
<term>Humans</term>
<term>Software</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr"><term>Algorithmes</term>
<term>Analyse de séquence d'ARN</term>
<term>Animaux</term>
<term>Données de la recherche comme sujet</term>
<term>Génome</term>
<term>Humains</term>
<term>Logiciel</term>
<term>Séquençage nucléotidique à haut débit</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><p>We propose a lightweight data structure for indexing and querying collections of NGS reads data in main memory. The data structure supports the interface proposed in the pioneering work by Philippe et al. for counting and locating <italic>k</italic>-mers in sequencing reads. Our solution, PgSA (pseudogenome suffix array), based on finding overlapping reads, is competitive to the existing algorithms in the space use, query times, or both. The main applications of our index include variant calling, error correction and analysis of reads from RNA-seq experiments.</p>
</div>
</front>
<back><div1 type="bibliography"><listBibl><biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Gusfield, D" uniqKey="Gusfield D">D Gusfield</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Langmead, B" uniqKey="Langmead B">B Langmead</name>
</author>
<author><name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Li, H" uniqKey="Li H">H Li</name>
</author>
<author><name sortKey="Durbin, R" uniqKey="Durbin R">R Durbin</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Danek, A" uniqKey="Danek A">A Danek</name>
</author>
<author><name sortKey="Deorowicz, S" uniqKey="Deorowicz S">S Deorowicz</name>
</author>
<author><name sortKey="Grabowski, S" uniqKey="Grabowski S">S Grabowski</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Kelly, Dr" uniqKey="Kelly D">DR Kelly</name>
</author>
<author><name sortKey="Schatz, Mc" uniqKey="Schatz M">MC Schatz</name>
</author>
<author><name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Ilie, L" uniqKey="Ilie L">L Ilie</name>
</author>
<author><name sortKey="Molnar, M" uniqKey="Molnar M">M Molnar</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Heo, Y" uniqKey="Heo Y">Y Heo</name>
</author>
<author><name sortKey="Wu, Xl" uniqKey="Wu X">XL Wu</name>
</author>
<author><name sortKey="Chen, D" uniqKey="Chen D">D Chen</name>
</author>
<author><name sortKey="Ma, J" uniqKey="Ma J">J Ma</name>
</author>
<author><name sortKey="Hwu, Wm" uniqKey="Hwu W">WM Hwu</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Schulz, Mh" uniqKey="Schulz M">MH Schulz</name>
</author>
<author><name sortKey="Weese, D" uniqKey="Weese D">D Weese</name>
</author>
<author><name sortKey="Holtgrewe, M" uniqKey="Holtgrewe M">M Holtgrewe</name>
</author>
<author><name sortKey="Dimitrova, V" uniqKey="Dimitrova V">V Dimitrova</name>
</author>
<author><name sortKey="Niu, S" uniqKey="Niu S">S Niu</name>
</author>
<author><name sortKey="Reinert, K" uniqKey="Reinert K">K Reinert</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Zhang, J" uniqKey="Zhang J">J Zhang</name>
</author>
<author><name sortKey="Kobert, K" uniqKey="Kobert K">K Kobert</name>
</author>
<author><name sortKey="Flouri, T" uniqKey="Flouri T">T Flouri</name>
</author>
<author><name sortKey="Stamatakis, A" uniqKey="Stamatakis A">A Stamatakis</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Ames, Sk" uniqKey="Ames S">SK Ames</name>
</author>
<author><name sortKey="Hysom, Da" uniqKey="Hysom D">DA Hysom</name>
</author>
<author><name sortKey="Gardner, Sn" uniqKey="Gardner S">SN Gardner</name>
</author>
<author><name sortKey="Lloyd, Gs" uniqKey="Lloyd G">GS Lloyd</name>
</author>
<author><name sortKey="Gokhale, Mb" uniqKey="Gokhale M">MB Gokhale</name>
</author>
<author><name sortKey="Allen, Je" uniqKey="Allen J">JE Allen</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Wood, D" uniqKey="Wood D">D Wood</name>
</author>
<author><name sortKey="Salzberg, S" uniqKey="Salzberg S">S Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Bazinet, Al" uniqKey="Bazinet A">AL Bazinet</name>
</author>
<author><name sortKey="Cummings, Mp" uniqKey="Cummings M">MP Cummings</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Philippe, N" uniqKey="Philippe N">N Philippe</name>
</author>
<author><name sortKey="Salson, M" uniqKey="Salson M">M Salson</name>
</author>
<author><name sortKey="Lecroq, T" uniqKey="Lecroq T">T Lecroq</name>
</author>
<author><name sortKey="Leonard, M" uniqKey="Leonard M">M Léonard</name>
</author>
<author><name sortKey="Commes, T" uniqKey="Commes T">T Commes</name>
</author>
<author><name sortKey="Rivals, E" uniqKey="Rivals E">E Rivals</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Philippe, N" uniqKey="Philippe N">N Philippe</name>
</author>
<author><name sortKey="Salson, M" uniqKey="Salson M">M Salson</name>
</author>
<author><name sortKey="Commes, T" uniqKey="Commes T">T Commes</name>
</author>
<author><name sortKey="Rivals, E" uniqKey="Rivals E">E Rivals</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Rizk, G" uniqKey="Rizk G">G Rizk</name>
</author>
<author><name sortKey="Lavenier, D" uniqKey="Lavenier D">D Lavenier</name>
</author>
<author><name sortKey="Chikhi, R" uniqKey="Chikhi R">R Chikhi</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Marcais, G" uniqKey="Marcais G">G Marçais</name>
</author>
<author><name sortKey="Kingsford, C" uniqKey="Kingsford C">C Kingsford</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Deorowicz, S" uniqKey="Deorowicz S">S Deorowicz</name>
</author>
<author><name sortKey="Debudaj Grabysz, A" uniqKey="Debudaj Grabysz A">A Debudaj-Grabysz</name>
</author>
<author><name sortKey="Grabowski, S" uniqKey="Grabowski S">S Grabowski</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Schroder, J" uniqKey="Schroder J">J Schröder</name>
</author>
<author><name sortKey="Schroder, H" uniqKey="Schroder H">H Schröder</name>
</author>
<author><name sortKey="Puglisi, Sj" uniqKey="Puglisi S">SJ Puglisi</name>
</author>
<author><name sortKey="Sinha, R" uniqKey="Sinha R">R Sinha</name>
</author>
<author><name sortKey="Schmidt, B" uniqKey="Schmidt B">B Schmidt</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Salmela, L" uniqKey="Salmela L">L Salmela</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
<author><name sortKey="Pertea, M" uniqKey="Pertea M">M Pertea</name>
</author>
<author><name sortKey="Fahrner, Ja" uniqKey="Fahrner J">JA Fahrner</name>
</author>
<author><name sortKey="Sobreira, N" uniqKey="Sobreira N">N Sobreira</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Kurtz, S" uniqKey="Kurtz S">S Kurtz</name>
</author>
<author><name sortKey="Phillippy, A" uniqKey="Phillippy A">A Phillippy</name>
</author>
<author><name sortKey="Delcher, Al" uniqKey="Delcher A">AL Delcher</name>
</author>
<author><name sortKey="Smoot, M" uniqKey="Smoot M">M Smoot</name>
</author>
<author><name sortKey="Shumway, M" uniqKey="Shumway M">M Shumway</name>
</author>
<author><name sortKey="Antonescu, C" uniqKey="Antonescu C">C Antonescu</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Manber, U" uniqKey="Manber U">U Manber</name>
</author>
<author><name sortKey="Myers, G" uniqKey="Myers G">G Myers</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Maier, D" uniqKey="Maier D">D Maier</name>
</author>
<author><name sortKey="Storer, Ja" uniqKey="Storer J">JA Storer</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Grabowski, S" uniqKey="Grabowski S">S Grabowski</name>
</author>
<author><name sortKey="Deorowicz, S" uniqKey="Deorowicz S">S Deorowicz</name>
</author>
<author><name sortKey="Roguski, L" uniqKey="Roguski L">L Roguski</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<affiliations><list><country><li>Pologne</li>
</country>
</list>
<tree><country name="Pologne"><noRegion><name sortKey="Kowalski, Tomasz" sort="Kowalski, Tomasz" uniqKey="Kowalski T" first="Tomasz" last="Kowalski">Tomasz Kowalski</name>
</noRegion>
<name sortKey="Deorowicz, Sebastian" sort="Deorowicz, Sebastian" uniqKey="Deorowicz S" first="Sebastian" last="Deorowicz">Sebastian Deorowicz</name>
<name sortKey="Grabowski, Szymon" sort="Grabowski, Szymon" uniqKey="Grabowski S" first="Szymon" last="Grabowski">Szymon Grabowski</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001747 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001747 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Sante |area= MersV1 |flux= Main |étape= Exploration |type= RBID |clé= PMC:4504488 |texte= Indexing Arbitrary-Length k-Mers in Sequencing Reads }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i -Sk "pubmed:26182400" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd \
       | NlmPubMed2Wicri -a MersV1
This area was generated with Dilib version V0.6.33.